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'^probability  diatribution  functions  are  presented  with  a  view  toward 
facilitating  the  aatheaetical  operations  which  frequently  occur  in 
practice. 


S/N  Ot03>LA-OU<««01 

McuaiTv  CkAUtriCATiON  or  TNit  rAatr*k«i 


1.  Introduction 

The  Harkovlan  "memoryless”  property  of  the  exponential 
probability  distribution  simplifies  many  conditional  probability 
calculations.  Exponential  functions  are  also  the  solution  to  many 
natural  elementary  differential  equations  which  model  a  broad  range 
of  applications.  As  such,  the  exponential  distribution  Is  a 
"workhorse"  of  applied  probability. 

A  first  attempt  to  generalize  the  exponential  family  was 
undertaken  by  Erlang  and  extended  by  Jensen  [1954].  Erlang 
considered  a  series  of  Independent  exponential  distributions  all 
with  the  same  parameter.  Then  Jensen  defined  the  generalized 
Erlang  (GE)  family  of  probability  distribution  functions,  which 
have  the  Interpretation  that  the  modeled  process  consists  of 
successive  Independent  stages  each  having  an  exponential 
distribution.  The  resultant  mathematical  operation  Is  seen  to  be  a 
convolution  since  the  duration  Is  distributed  as  the  sum  of 
several,  not  necessarily  Identical,  exponential  random  variables. 

The  GE  family  Increased  modeling  flexibility,  but  has  the 
restrictive  property  that  all  GEs  have  a  coefficient  of  variation 
less  than  one.  To  create  probability  distribution  functions  whose 
coefficient  of  variation  exceeds  unity,  one  considers  mixtures  In 
contrast  to  convolutions  of  exponential  distributions.  This  leads 


to  the  hyperexponentlal  family  of  probability  distribution 
functions. 

Schassberger  [1970]  showed  that  a  sequence  of  mixed  Erlangs 
can  be  found  which  will  converge  weakly  (that  Is,  It  converges  at 
every  point  of  continuity)  to  any  arbitrary  distribution 
function.  In  the  sense  of  polntwlse  convergence  at  points  of 
continuity,  we  can  then  say  that  the  class  of  mixed  generalized 
Erlangs  (MGEs)  Is  dense  In  the  class  of  all  distribution 
functions.  The  denseness  of  this  family  gives  an  Indication  of  the 
theoretical  comprehensiveness  of  the  MGEs  as  a  practical  modeling 
tool. 

In  parallel  with  the  above  developments,  computational 
techniques  related  to  transform  methods  from  complex  analysis  have 
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emerged.  While  the  laterpretaCion  of  transform  methods  is  not 
always  clear,  the  methods  frequently  reduce  computational  effort. 
The  Laplace  transform  of  the  exponential  is,  of  course,  the 
reciprocal  of  a  flrst~degree  polynomial.  Smith  [1953]  considered 
the  family  of  distributions  whose  transforms  are  the  reciprocal  of 
an  n^^  degree  polynomial  (K^).  This  is  a  natural  extension  of  the 
generalized  Erlang  in  transform  space.  It  further  admits  of  the 
interpretation  as  successive  exponential  stages  in  which  the 
exponential  parameters  are  possibly  complex. 

The  inverse  polynomial  approach  was  extended  to  the  rational 
transform  (R^)  case  by  Cox  [19SSJ.  He  showed  that  the  formal 
solution  methods  are  still  valid,  even  though  the  intermediate 
interpretation  is  a  bit  awkward.  The  exponential  stages  may  have 
complex  valued  parameters  and  the  "probabilities”  associated  with 
the  mixture  interpretation  may  be  negative.  It  is  worth  observing 


that  the  set  of  distribution  functions  with  rational  transforms  is 


dense  in  the  set  of  distribution  functions. 


While  the  method  of  rational  transforms  may  lead  to  a 
distribution  function  which  is  as  close  as  desired  to  an  arbitrary 
distribution  function,  the  degree  of  the  denominator  polynomial  may 
be  quite  high.  From  a  computational  viewpoint,  determining  all  the 
roots  to  a  reasonable  degree  of  accuracy  may  not  be  possible. 

Hence,  alternative  methods  with  reduced  computational  complexity 


are  needed. 


Neuts  [e.g.,  1981]  examined  the  phase-type  (PH)  class  of 
distribution  functions,  defined  as  the  times  until  absorption  in  a 
finite-dimensional  Markov  Chain.  While  this  family  of 
distributions  is  not  as  comprehensive  as  the  family,  it  is  still 
dense  in  the  space  of  all  distribution  functions.  It  also  has  the 
computational  advantage  that  matrix-vector  procedures  in  the  real 


domain  are  used  in  lieu  of  transform  methods. 


The  generalized  hyperexponential  (GH)  family  of  probability 
distribution  functions  is  examined  in  detail  in  this  paper.  The  GH 
family  has  the  same  computational  advantage  over  transform  methods 
that  the  PH  family  has,  namely,  avoidance  of  complex  arithmetic. 
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We  will  see  later  that  It  also  has  some  other  advantages  over  PH 
representations*  The  GH  family  of  probability  distribution 
functions  Is  partially  motivated  as  an  extension  of  the  notion  of  a 
mixture  of  exponentials,  where  we,  as  Cox  did  earlier,  permit  the 
Intermediate  mixing  "probabilities*'  to  have  negative  values. 

Botta  and  Harris  [1986]  demonstrated  the  denseness  of  the  GH 
family  In  the  class  of  all  distribution  functions.  The  present 
paper  addresses  several  additional  features  of  the  GH  family  which 
recommend  It  as  a  modeling  distribution  with  high  computational 
tractablllty.  A  major  feature  of  the  GH  distribution  function  is 
its  uniqueness  of  representation,  which  Is  discussed  In  detail  In 
the  latter  part  of  Section  2.  Section  2  also  provides  a  complete 
picture  of  the  position  of  the  GH  family  relative  to  the  other 
families  mentioned.  Six  closure  properties  of  the  family  which 
relate  to  the  mathematical  operations  frequently  performed  In 
applied  probability  are  discussed  In  Section  3.  A  final  section 
Illustrates  some  characteristics  of  the  GH  family  which  facilitate 
numerical  inversion  for  the  determination  of  Its  quantiles.  In 
addition,  the  related  problem  of  random  variable  creation  Is 
addressed,  using  a  special  acceptance-rejection  algorithm. 
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2.  Relations  Among  Classes  of  Distribution  Functions 

Families  of  probability  distribution  functions  that  find  wide 
use  as  approximations  to  more  general  CDFs  are  defined  and  related 
to  one  another  In  this  section.  The  more  obvious  relations  are 
mentioned  with  the  definitions^  while  others  are  presented  In 
following  sections.  Several  of  the  definitions  below  are  stated  In 
terms  of  the  one-sided  Laplace-Sttelt jes  transform  of  a  CDF,  F. 

This  transform,  F*,  Is  defined  as 

F*(8)  -  /  e’®*"  dF(t), 

0 

which  Is  equivalent  to  the  ordinary  one-sided  Laplace  transform  of 
a  PDF,  f(t)  ~  dF/dt,  whenever  F(t)  Is  absolutely  continuous. 


a 


i 
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2.1  Definitions 

K  Class 
n _ 

Smith  [1953]  defined  the  class  to  be  those  distribution 
functions  whose  Laplace  transforms  are  the  reciprocals  of 
polynomials  of  the  n**^  degree.  Not  all  reciprocal  polynomials  are 
transforms  of  CDFs.  For  example,  the  real  part  of  each  polynomial 
root  must  be  negative,  and  while  the  roots  may  be  complex,  they 
must  occur  In  conjugate  pairs  since  the  corresponding  CDF  Is 
real.  There  are  also  additional  constraints  that  are  not  so 
obvious.  Lukacs  and  Szasz  [1951]  have  shown  that  one  of  the  roots 
with  greatest  real  part  must  be  real.  Therefore,  the  simplest 
member  of  having  complex  roots  Is  of  the  form 

a(a^  +  b^) 

F*(s) - 2 - 2-  , 

(8  +  a)[(8  +  a)  +  b  ] 

corresponding  to  the  PDF 

f(t)  -  ab“2(a2  +  b2)e"**^(l  -  cos  bt)  (a  >  0).  (2.1.1) 
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The  exponential  distribution  belongs  to  K„.  Since  the  Laplace 
transform  of  the  distribution  of  a  sum  of  Independent  random 
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variables  Is  the  product  of  the  Laplace  transforms  of  their 
Individual  distributions,  the  generalized  Erlang  CDFs  corresponding 
to  a  sum  of  Independent,  exponentially  distributed  random  variables 
with  distinct  parameters  are  also  In  These  generalized 

Erlangs,  denoted  GE,  have  transforms  of  the  form 


s  +  X. 


>  0) 


where  X^/Cs  +  X^)  Is  the  transform  of  an  exponential  COF  having 
mean  1/X^.  If  all  the  random  variables  are  Identically 
distributed,  the  resulting  distribution  Is  the  (simple)  Erlang  of 
degree  n,  Ejj(X),  and  Its  Laplace  transform  Is  just  X'^/Cs  +  X)°. 
Therefore,  we  see  that  Eq(X)  c  and 

GEC  Kq  •  (2.1.2) 

R  Class 
n 

While  contains  GE,  It  does  not  contain  mixtures  of  GE  CDFs, 
l.e.,  distributions  of  the  form  ^a^F^  with  a^  >  0,  la^  *  1  and  F^  e 
GE.  Suppose,  for  example,  each  F^  Is  exponential.  By  the 
linearity  of  the  Laplace  transform,  the  transform  of  ^a^Fjj^  Is 


1-1  s  +  X^ 

When  combined  Into  a  single  fraction,  a  quotient  of  two  polynomials 
results,  the  degree  of  the  denominator  being  n  and  the  degree  of 
the  numerator  n  -  1.  This  motivates  the  definition  of  as  the 
class  of  distributions  whose  transforms  are  rational.  The  Index  n 
Is  the  degree  of  the  denominator  polynomial.  Hence,  the  class  of 
mixed  generalized  Erlang  distributions,  denoted  by  MGE,  Is 
contained  in  Rj^.  Cox  [1955]  points  out  that  both  the  convolution 
and  the  mixture  of  any  pair  of  distributions  In  R^  yields  another 
distribution  with  rational  Laplace  transform.  Furthermore,  all 
distributions  In  R^  are  continuous  except  for  possible  atoms  at  the 
orlglr  ind  the  corresponding  density  function  is  positive  every¬ 
where  In  (0,-)  except  at  Isolated  points.  Finally,  one  sees  that 
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(2.1.3) 


PH  Class 


Neuts  [1975,  1981]  has  popularized  a  class  of  distribution 
functions  known  as  "phase"  type,  or  PH,  distributions.  A  CDF  Is  of 
phase  type  If  It  can  be  Interpreted  as  the  time  until  absorption  In 
a  finite-state  continuous-time  Markov  chain.  That  Is,  F  Is  phase 
type  If  It  can  be  written  as 

F(t)  -  1  -  o  •  e^*^  •  ±  (2.1.4) 

where  Q  Is  the  generator  matrix  and  has  the  form 


-‘‘11  ‘*12***  **10 

‘*21  ■‘*22***  ‘*2n  ^‘*ii  ^  ‘*ij  *  ^  H 

J  ..  _„*•  ■‘*11  *  ^  qij  <  0,  1  -  l,2---,n). 

‘*nl  ‘*n2  ‘*nn  J**- 


This  generator  matrix  corresponds  to  an  (n  +  l)-8tate  Markov  chain 
with  absorbing  state  (n  +  1).  The  vector  _a  ■  (oj^,ot2,  •••,a^^)  Is 
the  vector  of  Initial  state  probabilities  at  t  ■  0,  and  the  vector 
e^  Is  an  n-dlmenslonal  column  vector  of  all  ones.  The  entries, 

In  the  generator  matrix  represent  the  Instantaneous  rate  of  the 
transition  from  state  1  to  state  J.  Each  component  of  e*^^  •  ^ 
corresponds  to  a  phase-type  distribution  that  results  from  starting 
In  a  particular  state.  Therefore,  (2.1.4)  can  be  Interpreted  as  a 
mixture  of  phase-type  distributions,  that  Is, 

F(t)  -  I  Oj[l  -  (eQ*"  •  e)^]. 

Two  examples  of  distribution  functions  with  PH  representations 
follow. 

Example  2.1.1  The  GE  distribution  of  order  n  with  parameters 
^l»^2>*'*»^n  *'®®  ^*'®  representation^*  (1,0,0, •••,0)  and 


-Xj  Xj  0*- 

0  -X2  X2' 


0  o»»»»-x  ,x  , 

n-1  n-1 

0  0 . 0  -X 


TOl 


Example  2.1.2  The  mixed  exponential  distribution 


F(t)  -  I  a.(l  -  e'^*") 
i«l 

has  the  representation  a  -  (o^  ,oi2>  *••»“„)  a  diagonal  Q  matrix 
with  elements  -X^ . 

Notice  that  PH  representations  are  not  unique.  That  is,  there 
may  exist  many  different  generator  matrices  of  different  orders 
that  lead  to  the  same  COF.  (We  provide  an  example  later  in  Section 
2.5.)  The  problem  of  finding  minimal  representations  of  PH 
distributions  (that  Is,  where  the  order  of  Q  is  as  small  as 
possible)  Is  an  open  question.  Neuts  [1981]  did  show  that  the 
class  of  PH  distributions  Is  closed  under  convolution  and  finite 
mixtures  but  not  under  infinite  mixtures.  It  follows  that  MGE 
distributions  are  phase  type,  l.e., 

MGEC  PH. 

The  representation  (2.1.4)  of  a  PH  distribution  was  obtained 
from  the  distribution  functions,  _v(t),  of  the  Individual  states  of 
the  underlying  Markov  chain  which  are  the  solutions  of 

dv(t) 

-  -  v(t)-Q.  (2.1.5) 

The  solution  to  this  equation  Is  ^(t)  ■  ^(0)0^*"  ■  ^  e^^.  Taking 
the  Laplace  transform  of  (2.1.5)  yields 

sV*(s)  -  _v(0)  »  V*(s)*Q  , 

V*(s)»(sl  -  Q)  ••  jKO)  ^  a 

V*(s)  -  a  (si  -  Q)"^  . 

Thus  (si  -  Q)~^  Is  the  Laplace  transform  of  e^^,  and  each  term  in 
the  Inverse  matrix  of  si  -  Q  is  a  rational  expression. 


so  that 


or 
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Multiplication  by  ^  yields  rational  expressions  for  each  component 
of  V*(s).  Therefore,  the  probability  distribution  of  each  state 
belongs  to  as  does  the  distribution  of  the  time  until 
absorption.  Therefore,  we  see  that 

PHC  Rti  •  (2.1.6) 

Phase-type  distributions  exist  which  possess  Laplace 
transforms  that  are  not  reciprocal  polynomials,  so  that  PH(^  K^. 

But  it  is  not  possible  for  every  distribution  to  have  a  PH 
representation.  Corollary  2.2.1  in  Neuts  [1981]  proves  that  any 
non-trlvial  PH  distribution  has  a  corresponding  density  function 
that  is  strictly  positive  for  all  t  >  0.  The  PDF  given  in  (2.1.1) 
has  a  reciprocal-polynomial  Laplace  transform  but  the  density 
function  is  zero  wherever  cos  bt  =  1.  Therefore,  the  corresponding 
distribution  function  is  not  in  PH  and  which  implies  that 

PH  and  that  PH  is  thus  a  proper  subset  of  R^.  Observe  that, 
for  an  arbitrary  CDF,  there  is  no  easy  way  to  determine  if  it  is  in 
PH.  One  must  search  for  a  suitable  generator  matrix  and  set  of 
initial  conditions  that  will  yield  the  desired  distribution. 
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GH  Class 


The  generalized  hyperexponential  distributions  are  CDFs  of  the 


form 


,  r  -X.t 
1  -  1  a  e  i 

i-1 

n 

with  and  a^  real,  X,  >  0  and  a  •  1.  Unlike  the  usual 

^  i-1 

hyperexponential  distribution,  we  do  not  require  that  each  a^  be 
nonnegative.  This  added  freedom  makes  the  GH  distributions 
extremely  versatile.  Indeed,  Botta  and  Harris  [1986]  derived  the 
critical  characterization  that  any  CDF  on  [0,“)  can  be  approximated 
by  a  member  of  GH  as  closely  as  desired  with  respect  to  an 
appropriate  metric. 

The  Laplace  transform  of  a  GH  distribution  is 
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1-1  8  + 

so  one  notes  that 

GHCRji  (2.1.7) 

But  not  all  functions  of  the  form  1  “  I  a.e  1*"  with  X,  >  0  and 

1-1  ^ 
n 

^  a.  -  1  are  GH  distributions.  We  do  know,  for  example,  that 
1-1  ^ 

n 

the  CDF's  monotonlclty  requires  that  ^  a. X.  >  0.  Also,  assuming 

1-1  ^ 

Xg  to  be  the  smallest  of  the  Xj^,  the  corresponding  coefficient  a^ 
must  be  positive  to  insure  proper  asymptotic  behavior  as  t 
Bartholomew  [1969]  derived  a  number  of  sufficient  conditions  for  a 
linear  combination  of  exponentials  to  be  a  GH  distribution,  but  no 
set  of  conditions  that  are  both  necessary  and  sufficient  is 
known.  Dehon  and  Latouche  [1982]  have  recently  characterized  the 
class  of  GH  distributions  by  deriving  a  parametric  equation  of  the 
boundary  of  the  convex  region  constituting  GH  for  the  case  n  -  3. 
The  geometric  representation  Is  obtained  by  choosing  a  set  of  basis 
vectors  from  the  class  of  all  GH  distributions  composed  of  linear 
combinations  of  three  exponentials.  It  does  not  appear  that  the 
boundary  equation  can  be  easily  used  to  determine  if  a  candidate 
exponential  sum  Is,  In  fact,  in  GH.  For  sums  of  more  than  three 
exponential  terms,  the  boundary  equation  could  be  determined  In 
similar  fashion  but  would  be  very  Involved  and  still  not  of  much 
practical  use  In  determining  membership  In  GH. 

We  next  develop  some  additional  relations  among  the  classes 
K^,  R^,  GE,  MGE,  PH,  and  GH. 

2.2  GH  and  PH 

From  the  preceding,  we  know  that  all  PH  distributions  are  In 
Rjj.  From  the  discussion  leading  up  to  (2.1.6),  It  Is  clear  that  If 
the  generator  matrix  has  distinct  real  eigenvalues,  then  the 
corresponding  PH  distribution  also  will  be  In  GH.  But  if  the 
denominator  polynomial  has  repeated  or  complex  roots,  the 


*  .  ■  ej 


E 


corresponding  distribution  will  not  belong  to  GH.  The  following 
example  displays  such  a  PH  distribution. 

Example  2.2.1  Consider  the  3x3  generator  matrix 


‘-1  1  o' 

1-21. 

_  1  0-3, 


The  eigenvalues  of  Q,  which  are  equal  to  the  roots  of  the 
denominator  polynomial  of  the  Laplace  transform  of  e^^,  are 


-  -.2307  ;  -  -2.8846  ±  .5897  1 


where  1  ~  /^.  The  resulting  PH  distribution  corresponding  to  an 
initial  state  vector  ^  *  (1,0,0)  Is 

F(t)  -  1  -  1.1729 

-  (.1729  cos  .5897t  +  .3868  sin  .5897t] 

Because  of  the  trigonometric  terms,  F(t)  Is  clearly  not  In  GH  so 

PH  GH  . 

But  not  every  GH  distribution  has  a  PH  representation.  As 
mentioned  earlier,  the  density  function  corresponding  to  any  PH 
distribution  Is  strictly  positive  for  all  t  >  0.  The  following 
example  exhibits  a  GH  distribution  that  violates  this  condition. 

Example  2.2.2  Consider  the  GH  distribution  defined  by 

F(t)  -  1  -  (4e"‘^  -  6e“^^  +  3e'’^‘^) 

with  corresponding  density 


f(t)  -  F'(t)  -  4e~^  -  12e"^’^  +  9e'^^ 
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It  can  easily  be  shown  that  £(t)  -  0  for  t  ~  In  (3/2)  and  that  £(t) 
>  0  for  all  other  values  of  t.  Therefore,  F(t)  Is  not  PH  and 

GH^  PH 


2.3  MGE  and  GH 

Recall  that  the  generalized  Erlang  (GE)  distributions  have 
Laplace  transforms 

1-1  ■'  ■  "1 

where  the  are  distinct.  Using  a  partial  fraction  expansion, 
this  transform  can  be  written  as 


A 

s  +  X 


where  the  are  real.  Any  mixture  of  such  distributions  has  a 
transform  of  the  same  form.  So,  any  mixed  generalized  Erlang 
distribution  Is  In  GH  and 


.  MGEC  GH  .  (2.3.1) 

Based  upon  results  In  Dehon  and  Latouche  [1982],  we  next 
demonstrate  the  existence  of  GH  distributions  that  cannot  be 
represented  as  MGEs  of  the  same  order.  They  show  that  any  GE 
distribution  constructed  from  a  subset  of  exponential 
distributions,  (Fj},  can  be  expressed  as  a  mixture  of  the  GE 
distributions  Fj^,  Fj^2i**’»  ^i2***n  ^12»*»i  convolution 

of  the  first  1  exponential  distributions  from  {Fj}.  Each  such 
distribution  function  can  be  written  as 


j-1 


1 

TT 

k-l 


X  -  X, 
k  j 


Fj(t) 


(t  >  0), 


(2.3.2) 


L  J 

where  Fj(t)  -  1  -  e“^j^  and  It  has  been  assumed  without  loss  of 
generality  that  Xj^  >  X2  >•••>  X^.  Since  the  {Xj}  are  constants, 
(2.3.2)  is  In  the  form  of  a  GH  distribution  whose  coefficients  are 
determined  by  the  fXj},  which  agrees  with  (2.3.1).  In  order  for  a 
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GH  distribution,  F(t)  ■  1  ~  I  a^e  1^,  to  have  a  MGE 
representation,  there  must  exist  a  set  of  nonnegative  numbers  {b^, 
1  >  l,2,***,n}  which  sum  to  one  and  satisfy  the  equation 


1  -  I  a  e"  1*=  -  I  — 

1“1  ^  1*1 


(2.3.3) 


By  substituting  (2.3.2)  Into  (2.3.3),  collecting  like  terms  and 
then  equating  coefficients  of  each  exponential  term  on  the  left  and 
right  sides  of  (2.3.3),  the  following  triangular  system  of  linear 
equations  relating  the  {a^}  and  {bj^}  results: 


n  k  X, 

‘l  -  bl  +  I  bfe  TT  x-^  * 

^  k-2  j-2  '^1 


il  -  I  \  rr  (1-2,3, “sn). 

k-1  J-1  *1 

j*i 


(2.3.4) 


This  system  of  equations  Is  readily  Inverted  to  yield  the  {b^}  In 
terms  of  the  {a^}  as  follows: 


'■i  ■  J,  ^  J,  Vk- 

k-1  1  k-1 


(2.3.5) 


'*1  "  ^  — i -  (1-2,3,  •••,n). 

rr  A, 


For  the  case  n  -  3,  the  above  system  of  equations  becomes 


ai  -  bi  + 


'2  ''1 


'2  (X^  -  X^)(X3  -  X^)  "3  • 


I 


•is 

if 

% 


I 


*5, 

ft* 

'•t 

A' 


I 


S' 

i 

S^r 

.J 

*'t' 


i 


■j- 

;? 


S 

il 


1^ 


X^X^ 

*2  •  Xj  -  T"  **2  (X^  -  X^XXj  -  X^)  ‘*3  ’ 


(2.3.6) 


(X3  -  X^)(X3  -  X2)  3 


and  thus 


bi  -  Si  + 


X1.X2 


^3^^l  ”  ^3^ 


(Xj^  —  X3)(X2  ~  X3) 


(2.3.7) 


From  (2.3.7)  63  Is  guaranteed  to  be  nonnegative  since  a3  >  0  for 
F(t)  to  be  a  distribution  function  and  the  X^  are  positive  with 
Xf  >  X2  •••  >  X^.  The  nonnegativity  of  bj^  follows  from  noting 
that  bj^  can  be  written  as  (l/Xj^)  F'(0)  where  F'(t)  ■  ^  F(t). 
Since  F'(t)  is  the  PDF  corresponding  to  F(t),  it  must  be 
nonnegative  for  all  t.  Requiring  that  b2  >  0  leads  to  the 


condition 


*2  "' 


X3(X^  -  X3) 


(2.3.8) 


The  next  example  demonstrates  that  there  exist  GH 
distributions  for  which  condition  (2.3.8)  is  violated. 

Example  2.3.1  Consider  the  GH  CDF 

F(t)  -  1  -  (ee”^’^  -  13e"^*^  +  8e"2t). 

Here 

aj^  ■  6,  33  *  “13,  a3  ■  8 

and 

^1  "  ^2  "  ^3  " 


Therefore 

^3(^1  ■  32 

■  “3 "  ■  -j  • 

Since  82  <  -32/3,  we  see  that  (2.3.8)  Is  violated  and  thus  that  no 
HGE  representation  exists  for  F(t).  This  example  establishes  that 

GH  ^  MGE  , 

and  that  the  class  of  MGE  distributions  Is  thus  a  proper  subset  of 
the  class  of  GH  distributions.  However,  It  Is  sometimes  possible 
to  obtain  a  MGE  representation  by  embedding  the  problem  In  a 
higher-order  space  even  when  there  Is  no  valid  MGE  representation 
in  the  original  space.  An  Illustration  Is  provided  later  In 
Section  2.5  as  Example  2.5.1. 

2.4  MGE  and  PH 

He  established  In  Section  2.1  that  all  MGE  distributions  are 
phase  type.  Since  PH  distributions  may  include  trigonometric 
terms,  It  is  clear  that  the  MGE  distributions  are  a  proper  subset 
of  PH.  But  what  If  the  PH  generator  matrix  Is  allowed  to  have  only 
real  eigenvalues?  Is  the  resulting  subclass  of  PH  distributions 
contained  In  MGE?  The  answer  Is  no.  We  obtain  this  result  by  way 
of  a  counterexample. 


Example  2.4.1  The  PH  distribution  given  by 

F(t)  -  1  -  (1.293  -  .343  +  ,o50  e“*^59tj^ 

(2.4.1) 


was  obtained  from  the  generator  matrix 


Q  - 


1/8 

0 

-1 


(2.4.2) 


with  a  ■  (1,0,0).  As  before,  equating  F(t)  to  b2^Fj(t)  +  b2F2^2(t)  + 
b3Fi23(t)  and  solving  for  the  {bj^}  yields  the  result  that  b2  * 
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I 

I 


s 


i 


s 


-.0369.  Since  each  must  be  nonnegatlve,  we  do  not  have  a  valid 
MCE  representation.  Thus,  PH  distributions  with  real  roots  do  not 
necessarily  belong  to  MGE.  In  other  words, 

PH  (real  roots)  MGE 
and  MGE  is  a  proper  subset  of  PH  (real  roots). 


2.5  Uniqueness  of  Representation 

For  statistical  applications,  an  important  property  of 
mixture-type  CDFs  is  uniqueness  of  representation,  or 
Identlflablllty.  Yakowltz  and  Spraglns  [1968]  define  the 
ident if lability  of  finite  mixtures  as  follows.  If  ^ 

collection  of  CDFs,  then  the  class  of  finite  mixtures  of 
the  {f^}  is  said  to  be  Identifiable  if  the  convex  hull  of 
{F^}  has  the  property  that 


N 

I 

i-1 


*^1^1 


M 

I 

1-1 


where  c^>0,  ^c^-1,  implies  N  -  M  and  that  for  each 
i  (1  <  i  <  N)  there  is  some  J  (1  <  j  <  N)  such  that  Cj^ 

f  I 

A  necessary  and  sufficient  condition  for 


Cj  and  Fj^ 


y 


Identlf lability  is  that  the  class  {f^}  be  a  linearly  Independent 
set  over  the  field  of  real  numbers.  This  follows  from  the 
uniqueness  of  representation  property  of  a  basis  in  a  vector  space. 

Since  any  collection  of  distinct  exponentials  is  linearly 
Independent,  the  class  of  finite  mixtures  of  exponential  CDFs  is 
identifiable.  A  broader  concept  of  identlflablllty  for  generalized 
mixtures  also  applies  when  the  underlying  family  of  CDFs  is 
exponential.  A  generalized  mixture  is  one  where  the  mixing 
parameters  sum  to  unity  but  can  have  any  real  values;  the  GH 
distributions  are  of  this  form.  Again,  the  uniqueness  of  the 
representation  of  vectors  with  respect  to  a  basis  for  the  vector 
space  implies  that  GH  distributions  have  unique  representations  as 
linear  combinations  of  exponentials. 


i 


Importantly,  the  other  families  of  CDFs  considered  in  this 
work  do  not  share  the  uniqueness  of  representation  property  with 
the  distributions.  For  example,  consider  the  following  two 
distinct  phase-type  representations: 

■  -3  1  1  I 

Q  -  1  -4  2  ,  a  -  (0,1/2, 1/2) 

_  1  0  -6  J 

and 

q'  -  [‘J  _5  ]  .  o’  -  (2/3, 1/3)  . 

Clearly  the  two  representations  are  different  and  are  not  of  the 
same  order.  However,  each  results  in  the  same  CDF,  namely, 

F(t)  “  1  -  (2/3)  e”^^  -  (1/3)  e“^^.  The  second  representation  is 
of  minimal  order  since  the  CDF  is  a  mixture  of  two  exponentials. 

Mixed  generalized  Erlang  distributions  also  permit  multiple 
representations.  From  the  notation  of  Dehon  and  Latouche  [1982]  we 
may  represent  the  CDF  of  the  sum  of  n  independent  random  variables, 
each  exponentially  distributed  with  parameter  1^^  (i  >■  l,2,***,n), 
by  F, .  .  Now  consider  the  two  CDFs  defined  by 

F(t)  -  (1/3)  Fj  +  (2/3)  Fi3 

and 

G(t)  -  (1/3)  Fi  +  (4/9)  Fi2  +  (2/9)  F123  • 

That  these  two  CDFs  are,  in  fact,  the  same  can  be  seen  by  express¬ 
ing  each  as  a  linear  combination  of  the  underlying  exponential 
distributions.  The  following  unique  representation  is  obtained: 

F(t)  -  G(t)  -  (-1/3)  Fi  +  (4/3)  F3  . 

As  in  the  PH  example,  one  of  the  MGE  representations  is  not  of 
minimal  order. 
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For  most  applications,  such  as  curve  fitting,  non-uniqueness 
of  representation  Is  a  disadvantage.  But  obtaining  a 
representation  of  non-mlnlmal  order  sometimes  may  be  useful.  For 
example,  suppose  we  have  a  GH  distribution  that  does  not  have  an 
MCE  representation  of  minimal  order.  It  may  be  possible  to  embed 
the  distribution  In  a  higher-order  space  In  such  a  way  that  an  MCE 
representation  Is  obtained.  We  Illustrate  the  procedure  via  an 
example . 

Example  2.5.1  Consider  the  GH  distribution 


P(t) 


-7t 

e 


35  -3t  ^  21  -It. 

-4  e  +-50  ) 


Here  Xj^“7,  *  3,  X^*2.  Dehon  and  Latouche  [1982] 

established  that  an  )^E  representation  exists  If,  and  only  if, 
there  exists  a  set  of  coefficients  (b^^,  1  *  1,2, 3, 4}  such  that 


F(t)  ■  biFi  +  b2Fi2  +  '»3'^123  '*4^1234 

with  the  {b^}  nonnegative  and  summing  to  one.  It  can  be  shown  that 
such  a  set  of  coefficients  does  not  exist  (b3  Is  negative).  Let  us 
now  add  an  additional  exponential  term,  e~^^,  and  write 


F(t)  -  1  -  <-  xf  +  0  ^  - 

Here,  X^  -  7,  X^  -  6,  X^  -  4,  X^  -  3,  X^  -  2. 
for  the  coefficients  from 


35  -3t  .  21  -2t. 

-4®  ■"-5® 

We  must  now  solve 


F(t) 


b'F' 

®1*1 


b'F’ 

02*12 


4*^123 


'’4^1234 


b'F' 

®5' 12345 


where  the  primes  Indicate  that  the  corresponding  terms  are  defined 
with  respect  to  the  {X|}.  It  turns  out  that  there  is  a  solution 
for  the  {b|}  that  results  In  the  representation 


F(t)  “  5-  +  j  ^{2  27  ^123  27  ^1234  I  ^12345  * 


Not  only  does  this  give  us  an  MGE  representation.  It  also  confirms 
that  the  original  F(t)  Is,  In  fact,  a  valid  CDF  since  It  can  be 
expressed  as  a  mixture  of  CDFs. 

Since  all  HGEs  are  of  phase  type  and  there  exist  GHs  that  are 
not  members  of  PH  (see  Section  2.2),  It  Is  not  possible  to  obtain 
an  MGE  representation  for  every  GH  distribution.  A  more  complete 
discussion  of  the  representation  of  GH  distributions  as  MGEs, 
Including  a  set  of  necessary  and  sufficient  conditions  that  does 
not  require  solving  for  the  tb^}  coefficients.  Is  contained  In 
Botta  [1986].  The  uniqueness  property  provides  a  strong  rationale 
for  our  Interest  In  the  GH  class  of  distributions.  It  Is 
unfortunate  that  the  PH  family  of  distributions  does  not  have  a 
corresponding  uniqueness  property. 

2.6  Summary  of  Set  Inclusion  Relations 

The  results  of  the  foregoing  sections  yield  the  following  set 
of  relations  among  the  classes  of  distribution  functions: 

(1)  GE  C  C  Rn 

(2)  GE  C  MGE  C  GH  C  R„ 

(3)  GE  C  MGE  C  PH  C  Rjj 

(4)  PH  i  i  PH  =:>  R^^  PH 

(5)  PH  GH  :  GH  PH 

(6)  GH  ^  MGE  (of  same  order) 

These  relations  are  depicted  In  the  Venn  diagram  below. 


I# 

I 
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3.  Closure  Properties 

Closure  properties  guarantee  that  the  result  of  performing 
certain  mathematical  operations  on  members  of  a  class  will  be 
another  member  of  the  same  class.  For  example,  the  set  of  positive 
Integers  Is  closed  under  addition  and  multiplication.  Closure  Is 
useful  In  applications  because  the  outputs  of  certain  processes 
will  share  the  properties  of  the  Inputs.  In  this  section  we 
present  some  closure  properties  of  the  GH  class  of  distributions 
arising  In  probability  modeling,  order  statistics,  reliability 
theory,  and  queueing  theory.  These  properties  provide  additional 
Justification  for  using  GH  distributions  as  approximations  to 
arbitrary  CDFs. 

3.1  Probability  Modeling 

In  probability  modeling,  mixtures  of  distributions  and 
convolutions  of  distributions  frequently  arise.  Neuts  [1975,  1981] 
has  shown  that  the  PH  class  Is  closed  under  convolutions  as  well  as 
under  finite  mixtures.  The  PH  class  Is  also  closed  under  Infinite 
mixtures  In  the  sense  that  an  Infinite  mixture  of  successive 
convolutions  of  a  PH  distribution,  where  the  mixing  parameters 
constitute  a  discrete  PH  distribution.  Is  Itself  PH.  Since  the  GH 
class  consists  of  linear  combinations  of  a  finite  number  of 
exponential  terms.  It  follows  Immediately  that  GH  Is  closed  under 
finite  mixtures.  An  Infinite  mixture  could  Involve  an  Infinite 
number  of  exponential  terms  and  would  then  not  be  In  GH,  by 
definition.  Therefore,  GH  Is  closed  under  Infinite  mixtures  only 
when  the  number  of  distinct  exponential  terms  contained  In  the 
mixture  Is  finite. 

The  GH  class  Is  not  closed  under  convolutions.  This  can 
easily  be  demonstrated  by  noting  that  the  GH  density  f(t) 

■  Xe  convolved  with  Itself  yields  the  Erlang-2  density 

f(t)*f(t)  -  X^te"^*^ 


(3.1.1) 


which  Is  not  In  GH.  While  undesirable,  this  lack  of  closure  under 
convolution  Is  not  of  much  practical  consequence,  however.  By 
virtue  of  the  denseness  results,  we  can  closely  approximate  any 
Erlang  distribution  with  a  GH  distribution. 

The  GH  class  of  distributions  Is  closed  under 


multiplication.  If  and  T2  are  two  random  variables  with 
corresponding  CDFs  F^Ct)  and  F2(t),  then  it  Is  well-known  that  the 
distribution  of  max(Tj,T2)  Is  Fj(t)  •  F2(t).  When  Fj^  and  F2  are 
each  GH,  It  follows  that 


•  F^Ct)  -  (1  -  I  a^e"^!*")  •  (I  -  I  b^ 

^  ^  (3.1.2) 

-  1  -  I  a  e‘^1*^  -  I  b  e““j‘^  +  I  I  a  b  e"^^l 
1  j  ^  1  j  ^ 


which  clearly  Is  also  GH. 


3.2  Order  Statistics 


If  tj,  t2,***,  tjj  represent  the  values  obtained  by  taking  n 
Independent  samples  from  the  same  distribution,  the  corresponding 
order  statistics,  denoted  ^(n)*  obtained  by 

placing  the  samples  In  ascending  numerical  sequence;  that  is, 
t^j^j  <  t^2j  <  •••  <  ^(n)’  applications  it  is  frequently 

required  to  calculate  the  distribution  of  the  maximum  or  minimum  of 
the  sample.  Since  the  probability  that  the  maximum  value  does  not 
exceed  some  value,  say  t,  is  equal  to  the  probability  that  the 
value  of  each  random  variable  Tj,  T2,***,  Tjj  does  not  exceed  t  we 
have 


F*(t)  -  Pr{max  (T  ,  !  ,•••,  T  )  <  t}  -  TT  F  (t)  -  LFCt)]*"  (3.2.1) 

i-1 


where  F*  is  the  distribution  of  the  maximum  of  the  and  F^  *  F  is 
the  distribution  function  of  the  1^^  sample.  It  follows  by 
repeated  application  of  (3.1.2)  that  F*(t)  is  also  GH,  so  that  the 


n^  order  statistic  corresponding  to  a  sample  of  n  observations 
drawn  from  a  GH  distribution  is  also  GH.  From  (3.2.1)  note  that 


.  -  I  'J  'jLi' 


mmmsm 


V- 


this  result  holds  even  if  the  are  not  identically  distributed  as 
long  as  each  one  has  a  GH  distribution. 

The  distribution  of  the  order  statistic,  1  <  k  <  n,  is 
well-known  (see,  for  example,  David  [1981]  or  Guttman  et  al. 

[1982])  and  is  given  by 

F^(t)  -  t}  -  (“)F\t)(l-F(t))"  ■  ^ 

Using  (3.1.2)  repeatedly,  it  follows  that  the  class  GU  is  closed 
under  the  k*-^  order  statistic  for  1  <  k  <  n.  This  is  so  because 
the  above  summation  always  contains  the  term  F'^(t)  which  ensures 
that  unity  will  be  a  term  in  the  expanded  expression  for  F]^. 

GH  is  also  closed  under  certain  functions  of  order  statistics 
such  as  the  difference  between  pairs  of  them.  We  will  Illustrate 
for  the  special  case,  ,  called  the  range  of  the 

sample.  The  well-known  expression  for  this  CDF  is 

H(r)  -  Pr{T^^j-  T^^^  <  r}  -  n  /  f(t)[F(t  +  r)  -  F(t)]°  “  ^dt  . 

For  F  in  GH,  the  Integrand  consists  of  a  linear  combination  of 
exponentials  so  that  H  will  also  be  in  GH. 


3.3  Reliability  Theory 

In  reliability  theory,  one  wishes  to  compute  the  overall 
reliability  of  a  system  in  terms  of  the  reliabilities  of  its 
component  parts.  Of  course,  the  system  reliability  also  depends 
upon  the  way  in  which  the  components  are  connected.  For  example,  a 
series  connection  of  components  will  fall  when  the  first  component 
failure  occurs  while  a  parallel  connection  will  function  as  long  as 
there  is  at  least  one  good  component.  In  general,  systems  consist 
of  complex  structures  having  both  series  and  parallel  arrangements 
of  components  so  that  finding  the  system  reliability  is  difficult. 

For  each  component  in  a  system  we  define  a  binary  function 

^  ^  I  1,  if  component  1  is  functioning 

i  ^0,  otherwise 


For  a  system  consisting  of  n  Independent  components  we  then  define 
a  binary  variable,  as 


.  ^  t  1,  if  the  system  Is  functioning 

*'*1*  *2’  ’  ^n''  '  0,  otherwise 

and  ^  Is  called  the  structure  function  of  the  system.  A  component, 
1,  Is  said  to  be  Irrelevant  If 

for  all  combinations  of  the  variables  Xj,  J  ^  1.  It  is  intuitively 
reasonable  to  expect  a  system  to  contain  only  relevant  components 
and  for  Improvements  In  the  components  to  result  In  equal  or  better 
performance  of  the  system.  These  Ideas  are  formalized  In  the 
following  definition  of  a  coherent  system  (see  Barlow  and  Proschan 
11981]). 


Definition;  A  system  of  components  is  said  to  be  coherent  If 
a)  Its  structure  function  Is  monotonlc,  l.e., 
(|i(x^,...,x^)  >  <Ky^,...,yjj)  for  x^^  >  (3.3.1) 

and 


b)  It  contains  no  Irrelevant  components. 

Now  suppose  that  each  component  has  an  associated  lifetime 
distribution  that  gives  the  probability  of  the  component 
functioning  at  time  t.  That  Is,  let  F|^(t)  ■>  Prfcomponent  1  falls 
prior  to  t}  so  that  1  -  F^(t)  *  Pr{ component  1  Is  functioning  at 
time  t}.  We  will  now  show  that  the  GH  class  of  CDFs  Is  closed 
under  the  formation  of  coherent  structures.  That  Is,  If  each 
component  has  a  GH  distribution  of  lifetime  then  any  coherent 
structure  formed  from  those  components  will  also  be  characterized 
by  a  GH  distribution  of  lifetime.  Assaf  and  Levikson  [1982]  have 
shown  a  similar  result  for  phase-type  distributions.  Because  of 
their  simple  structure.  It  Is  easier  to  demonstrate  this  result  for 
GH  distributions. 
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Let  S  ■  .  .,x^)  I  x^  ■  0  or  x^  ■  1,  i  ■  1,2,  be  the 

state  space  of  the  system.  He  denote  the  2"  states  in  S  by  Sj^, 
S2i***>  S2n.  By  the  law  of  total  probability 


Pr{system  Is  functioning}  »  Pr{+(Xj^,*«»,Xjj^)  -  l} 

-  I  Pr{()>  -  lls^}  •  Pr{s^} 
Sj^e  S 


(3.3.2) 


For  each  state,  the  value  of  ^(Sj^)  la  known  since  the  state 
determines  whether  the  system  Is  functioning  or  not.  The 
corresponding  probabilities  are  therefore  either  zero  or  one.  That 
Is,  Pr{<fr  -  l|s^}  Is  zero  or  one.  Since  the  components  are  assumed 
to  be  Independent,  Pr{s^}  Is  Just  the  product  of  the  probabilities 
that  each  component  Is  functioning  or  not,  depending  upon  whether 
the  corresponding  value  of  Xj  Is  one  or  zero  for  the  state  In 
question.  For  example.  In  a  three  component  system  with  S2  - 
(1,0,1),  we  have  Pr{a2}  ■  (1  -  F]^(t))(F2(t))(l  -  F3(t)).  For  a 
coherent  system,  'fr(0,»»»,0)  -  0,  for  otherwise  (^  =  1  and  all 
components  would  be  Irrelevant.  The  probability  of  the 


corresponding  state,  P(0,0,«»»,0)  »  1  f  Fj^(t)  will  therefore  not 

1-1 

appear  In  expression  (3.3.2)  which  will  consist  of  a  sum  of 
products  of  the  form  (1  -  F  (t)]  •  ]~f  F.(t)  where  1  ranges 

i  j  ■’ 

over  all  values  for  which  x^  *  1  and  J  ranges  over  all  values  for 
which  Xj  -  0.  Since  the  component  distributions  are  GH  of  the  form 

Fj(t)  -  1  “  it  follows  that  (3.3.2)  will  be  a 

j  ^ 

linear  combination  of  exponentials,  so  that  the  lifetime  CDF  of  the 
system  will  also  be  GH.  We  illustrate  with  a  two-component  system. 


Example  3 . 3 

Let  F^(t)  -  1  -  (l/3)e"‘  -  (2/3)e"2t  and  F2(t)  -  1  -  2e"*^  + 


iw 


We  therefore  obtain  from  (3.3.2) 


Pr{System  is  functioning}  ■  -  Fj^)(l  -  F2)  +  ♦(1»0)(1  -  Fj^)(F2) 

+  *(0,l)(Fi)(l  -  F2) 

-  ♦(l,l)[(2/3)e"2t  +  -  (2/3)e"^*^]  (3.3.3) 

+  ♦(l,0)l(l/3)e**^  -  e"3t  +  (2/3)e-^t] 

+  ♦(0,l)t2e"*^  -  (5/3)e"2t  -  +  (2/3)e"^*^l. 

For  a  coherent  system,  (^(1,1)  >■  1.  There  are  two  possibilities: 
the  components  are  connected  in  series  so  that  (^(1,0)  -  ^(0,1)  ~  0 
or  they  are  connected  in  parallel  with  ^(1,0)  •  ^(0,1)  “  1.  The 
corresponding  results  from  (3.3.3)  are: 

a)  series  connection: 

Prjsystem  functions}  *  (2/3)e  +  e  -  (2/3)e 

so  that  the  lifetime  distribution  of  the  system  is 
Fs(t)  -  1  -  (2/3)e-2t  -  +  (2/3)e-^*=. 

b)  parallel  connection: 

Fs(t)  -  1  -  (7/3)e“‘^  +  -  (2/3)0“^*^. 

The  closure  result  derived  above  applies  to  any  coherent 
structure  however  complex  and  includes  arbitrary  series-parallel 
arrangements  such  as  k  out  of  n  systems  as  well  as  bridge 
structures. 

3.4  Queueing  Theory 

The  M/G/1  queue  is  characterized  by  exponential  interarrival 
and  general  service-time  distributions.  Here  we  will  examine  the 
nature  of  the  steady-state  residual  service  times  and  queueing 
times  when  the  service  time  distribution  is  GH. 

The  residual  service  time  is  the  remaining  service  time  of  the 
customer  lu  service  at  the  Instant  a  new  customer  arrives  and  the 
waiting  time  (l.e.,  queueing  time)  is  the  time  an  arriving  customer 
must  wait  before  receiving  service.  Denoting  the  GH  service  time 


distribution  by  G(t)i  It  follow  from  renewal  theory  (see  Ross 
[1970])  that  the  distribution  of  the  residual  service  time  Is  given 

by  t 

R(t)  -  W  /  [1  -  G(x)]  dx  (3.4.1) 

0 

where  1/u  Is  the  mean  of  G(t).  Since  G  Is  GH,  this  Integral  Is  of 

the  form  ^  ^ 

U  I  (1  a.e“^l*)  dx  -  u  I  a.  /  e  ^1*  dx. 

0  0 

Therefore, 

R(t)  -  1  -  I  e"\*=  -  1  -  I  b  e"^*^ 

Af  1 

so  that  R(t)  Is  also  GH. 

A  simple  relationship  exists  between  the  Laplace  transforms  of 
R(t)  and  the  waiting  time  distribution,  W(t).  Denoting  the 
Laplace-Stleltjes  transform  of  F(t)  by  F*(s),  Gross  and  Harris 
[1985]  have  shown  that 

W*(s)  -  - (3.4.2) 

1  -  PR*(8) 

where  p  ~  X/p  Is  the  ratio  of  the  average  arrival  and  service 
rates.  By  expanding  the  right-hand  side  of  (3.4.2)  In  a  geometric 
series  and  taking  inverse  transforms  It  follows  that 

CO 

W(t)  -  (1  -  P)  I  p"[R^"^t)]  (3.4.3) 

0 

where  R^“^(t)  denotes  the  n-fold  convolution  of  R(t)  with  Itself. 
Since  we  have  shown  that  GH  Is  not  closed  under  convolution.  It 
follows  that  (3.4.3)  need  not  be  GH.  We  will  demonstrate  this  fact 
with  an  example. 

Example  3.4 

Let  G(t)  -  1  -  3e'*^  +  3e~^‘^  -  e"^*^ 
has  Laplace-Stleltjes  transform 


with  mean  l/p  ■  11/6.  This 


Corresponding  to  this, 


R*(8) 


W  [1  -  G*(8)]  _  11(8  +  6s  +  11) 

S  “  <8  +  1)(8  +  2)(8  +  3) 


which  is  seen  to  be  GH  by  expanding  in  partial  fractions. 
Computing  W*(8)  using  (3.4.2)  yields 


W*(8)  -  I 


1  -  p)  8(8  4-  1)(8  +  2)(8  +  3) 

8  +  DCs  +"2)(8  +'3)T8”\T+TX  • 


(3.4.4) 


To  have  a  steady-state  queue,  X  must  be  less  than  p  (6/11  in  this 
case).  Letting  X  >  1/2,  the  denominator  of  (3.4.4)  becomes 


8(8^  +  11/2  8^  +  88  +  1/2). 


The  cubic  expression  in  parentheses  has  a  real  root  at 
approximately  -.0634  so  this  expression  can  be  written  in  factored 
form  as 


8(s  +  .0654)(s2  +  5.4346  s  +  7.6445). 

The  discriminant  of  the  quadratic  term  is  less  than  zero,  so  that 
the  quadratic  has  complex  roots.  Therefore  W*(s)  has  complex  roots 
which  implies  W(t)  is  not  in  GH,  although  from  (3.4.4)  it  is 
clearly  in  R^.  We  note  from  (3.4.3)  and  the  fact  that  p  <  1  for 
equilibrium  that  U(t)  consists  effectively  of  a  finite  sum  of 
exponential  and  Erlang  terms.  Since  the  Erlangs  can  be  well 
approximated  by  GH  distributions,  from  a  practical  viewpoint  the 
waiting  time  distribution  can  be  treated  as  GH  even  though, 
strictly  speaking,  we  have  seen  that  it  is  not. 

By  way  of  comparison,  we  note  that  Neuts  [1975,  1981]  has 
shown  that  if  the  service-time  distribution  is  phase-type  then  both 
R(t)  and  W(t)  are  also  phase-type. 


4.  Numerical  Inversion  and  Random  Variate  Creation 

In  applications  It  Is  often  necessary  to  Invert  a  probability 
distribution  function;  that  Is,  given  a  value  for  F(t),  find  the 
corresponding  value  of  t*  For  all  but  the  simplest  COFs  an 
explicit  Inverse  cannot  be  found*  One  must  then  resort  to 
numerical  techniques,  which  may  be  particularly  complicated  when  a 
CDF  type  cannot  be  transformed  back  to  a  single  standard  form.  A 
related  (and  occasionally  Identical)  problem  Is  the  derivation  of 
random  variates  following  a  specific  distribution.  The  Inversion 
can  Itself  be  used,  but,  as  Is  typical,  there  Is  a  more  efficient 
way  to  generate  the  variates  by  using  the  distribution's  properties 
In  a  more  direct  way. 

4.1  Inversion 

Many  techniques  exist  for  solving  for  the  (unique)  root  of 
F(t)*p,  Including  the  bisection,  false  position,  secant,  and 
Newton-Raphson  methods.  These  and  other  techniques  may  be  found  In 
standard  works  on  numerical  analysis  such  as  Conte  and  de  Boor 
[1980],  King  [1984],  Ralston  [1965]  and  Hamming  [1971].  The 
techniques  differ  In  their  rates  of  convergence,  complexity  or 
computational  efficiency,  and  the  local  or  global  nature  of  their 
convergence.  In  practice,  tradeoffs  must  be  made  between  these 
characteristics  In  order  to  choose  the  method  most  appropriate  for 
the  application  at  hand. 

All  of  the  methods  listed  above  are  general  In  that  they  can 
be  used  to  find  the  roots  of  any  function.  We  now  propose  a  method 
for  Inverting  GH  CDFs  based  upon  their  underlying  structure,  that 
Is  guaranteed  to  converge.  Is  simple  to  Implement,  and  Is  free  of 
the  drawbacks  of  some  of  the  standard  techniques.  Its  chief 
disadvantage  appears  to  be  that  convergence  Is  only  linear.  We 
will  describe  the  method  first,  and  then  briefly  compare  its 
features  with  those  of  the  other  techniques. 

Our  method  belongs  to  the  class  of  so-called  "fixed-point" 
Iterations  of  the  form  *  g(x^).  We  begin  with  the  CDF 


F(t)  -  1  -  ^  e  bj  e  “j  ,  where  X^,  bj,  Oj  >  0* 

Given  a  particular  value  F(T)  we  wish  to  determine  T.  We  first 
define  two  additional  functions, 

G(t)  -  F(T)  +  I 

and  (A.l) 

H(t)  -  1  +  I  bj  e’“j^  . 

Therefore  G(t)  -  H(t)  ■  F(T)  -  F(t).  The  desired  value,  T,  is  then 
the  abscissa  of  the  Intersection  of  the  curves  G(t}  and  H(t)> 

Since  F(t)  is  monotonlc,  there  is  only  one  such  intersection. 
Furthermore,  G  and  H  are  each  monotonlcally  decreasing  and  have  the 
following  properties: 


and 


G(t)  >  H(t)  for  t  <  T 
G'(t)  -  H'(t)  «  -F'(t)  <  0  for  all  t. 


(4.2) 

(4.3) 


The  second  of  these  relations  follows  from  the  nonnegativity  of  the 
PDF.  Since  G*  and  H'  are  negative  everywhere,  (4.3)  Implies  that 

|G'(t)l  >  lH’(t)|  for  all  t.  (4.4) 


The  proposed  technique  for  finding  T  Is  most  easily  explained  by 
referring  to  the  graphs  of  G  and  H.  These  are  shown  in  Figure  2. 


Figure  2:  Geometry  for  Finding  the  Inverse  of  F 


Starting  from  any  point  t„  <  T,  If  t  Is  Incremented  In  such  a  way 
that  G  remains  greater  than  H,  we  will  still  be  to  the  left  of  T. 
Repeating  this  process,  we  will  converge  to  T  from  below.  We  next 
explicitly  describe  the  procedure  for  Incrementing  t  and  prove 


convergence. 


For  any  t^  <  T,  construct  the  tangent  to  G  through  the  point 
(tjj,  G(tjj)).  By  the  convexity  of  G  (G"  >  0  everywhere),  this 
tangent  line  Is  always  below  the  graph  of  G.  Construct  the 
horizontal  line  through  the  point  (t^,  HCt^)).  By  the  monotonlclty 


of  H,  this  line  lies  above  H  for  all  t  >  t^^.  Since  G'  <  0  and 


GCtjj)  >  H(tg),  the  tangent  to  G  and  the  horizontal  line  Intersect 
at  a  point  whose  abscissa  Is  greater  than  t^.  Let  this  value  be 
**0+1  *  the  process  beginning  at  t^^^.  The  sequence  of  steps 

Is  Illustrated  In  Figure  3. 
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Figure  3:  Iterative  Schene  Illustrating  Convergence. 

Since  the  point  of  Intersection  of  the  tangent  and  horizontal  lines 
lies  on  both  lines  we  have 

G(t^+i)  >  H(trt)  >  H(tn+i).  (4.5)  • 

From  the  above  construction  we  obtain  an  explicit  expression  for 
*’n-M  ^  follows: 


a<^n>  -  ^<^n> 

‘n+l  ■  ‘n 


-  G-(t„) 


or 


t  . ,  ■  t  “ 
n+l  n 


G(t„)  -  H(t„) 


(4.6) 


We  next  prove  that  the  sequence  {t^}  converges  to  T.  Since  t^^  <  T, 

G(t,^)  -  H(tjj)  >  0.  From  the  definition  of  G  In  (4.1), 

^  **X  c 

G'(t)  •  ~  I  ^  seen  to  be  negative  and  finite  for  all 


values  of  t.  Therefore,  from  (4.6)  we  have  that  t^^j^  >  t^  so  that 
the  sequence  {t^}  is  monotonlcally  Increasing.  From  (4.5)  and 
(4.2),  {tj^l  is  bounded  above  by  T.  Therefore,  {t^}  has  a  limit, 
tg.  Using  this  fact  together  with  the  continuity  of  G,  H,  and  G', 
and  taking  the  limit  of  (4.6)  yields 


Therefore , 


t.  ■  lim  t  . , 
o  n+l 


lim  t  - 
n 


lim  G(tj^)  -  lim  H(t^) 


lim  G'(t  ) 
n 


^o  - 


G(lim  t  )  -  H(lim  t  ) 
n  n 


G'dim  t  ) 
n 


G(0  -  H(t„) 


*^o  - 


(4.7) 


G(to)  -  H<to) 


since  G'(t^)  *  0  for  t^  finite,  this  implies  that  G(tQ)  -  H(tQ). 

Since  G  and  H  have  a  unique  point  of  intersection,  t^  ~  T,  and  the 

sequence  {t^}  converges  to  T,  as  desired. 

Fixed-point  algorithms,  in  general,  have  only  linear 

convergence  and  we  now  demonstrate  this  fact  for  our  algorithm. 

Denoting  the  error  in  the  n^**  Iterate  by  e  ■  t  -  T,  we  write 

n  n 

(4.6)  as 


As  tjj  nears  T,  G'(tjj)  is  approximately  equal  to  G'(T).  Using  the 
first  two  terms  of  the  Taylor  series  expansions  for  G  and  H  about 
T,  (4.8)  becomes 


'n+1 


=■  e 


G'(T)  -  H'(T) 
"  G’(T) 


T) 

fy 


Ke 


(4.9) 


This  Illustrates  the  linear  nature  of  the  convergence  of  {t^}  • 

From  (4.4)  we  see  that  e  <  e  except  possibly  when  G'(T)  ■ 

n'Ti  n 

H'(T).  In  that  case  F'(T)  -  0  so  that  the  solution  corresponds  to 
a  multiple  root.  To  obtain  an  expression  for  under  these 

conditions,  more  terms  must  be  used  In  the  Taylor  series  expansion 
In  (4.8).  With  a  triple  root,  we  obtain 


G"’(T)  -  H'"(T)  ®n 
‘'n+l  “  ^n  ■  - -  JT  * 


(4.10) 


Since  G(t)  -  H(t)  -  F(T)  -  F(t),  G"'(T)  -  H’"(T)  -  -  F'"(T).  At 
a  triple  root  F"(T)  ■  F'(T)  ■  0  so  that  F''*(T)  must  be  positive 
to  prevent  the  PDF  F'(t)  from  becoming  negative.  As  a  result 

G"'(T)  -  H*'*(T)  .  c  >  0 
G'(T) 

since  both  the  numerator  and  denominator  are  negative.  Therefore, 
from  (4.10), 


e  *  e  -  ^ 
n+1  n  3 !  n 

so  that  the  error  sequence  Is  still  decreasing  but  at  an  ever 
diminishing  rate.  In  a  situation  such  as  this  with  a  multiple 
root,  even  the  Newton-Raphson  algorithm  slows  from  a  quadratic  to  a 
linear  rate  of  convergence. 

Our  algorithm  will  converge  to  T  when  started  at  any  value  of 
t  <  T.  In  particular,  we  can  take  t  -  0  Initially.  Like  other 
globally  convergent  algorithms  such  as  bisection  or  false  position, 
convergence  Is  linear.  With  the  false  position  algorithm, 
however,  as  the  solution  Is  neared,  a  problem  with  step  size  may 


occur  since  the  denominator  of  the  step  expression  consists  of  the 
difference  of  nearly  equal  numbers.  Our  algorithm  does  not  have 
that  problem  since  the  denominator  of  the  step  term,  G',  Is  non~ 
zero  and  only  approaches  zero  when  F(T)  Is  very  close  to  unity. 

Near  the  solution  point,  the  bisection  and  false  position 
algorithms  can  suffer  from  underflow  since  the  product  of  two  small 
numbers  Is  used  to  determine  the  truncated  Interval  containing  the 
solution.  Also,  each  of  these  algorithms  requires  that  a  pair  of 
values  bracketing  the  root  be  supplied  to  start  the  search 
procedure.  A  separate  computation  will  usually  be  needed  to 
produce  such  a  pair  of  values.  A  slight  drawback  of  our  algorithm 
relative  to  the  bisection  algorithm  Is  that  the  solution  Is 
approached  from  below  so  that  we  do  not  get  an  estimate  of  the 
error  at  each  stage  of  the  Iteration. 

Locally  convergent  methods,  such  as  Newton-Raphson  and  secant, 
converge  faster  than  our  algorithm.  However,  convergence  Is  not 
guaranteed  unless  the  starting  point  Is  reasonably  close  to  the 
solution.  For  example,  Che  Newton-Raphson  method  may  oscillate 
between  two  values  and  If  an  Iteration  yields  a  negative  value  of  t 
or  a  zero-valued  derivative.  It  Is  not  clear  how  to  proceed,  short 
of  picking  a  new  Initial  point  and  beginning  again.  The  secant 
algorithm  can  also  produce  negative  values  of  t  when  both  points  of 
the  current  Iteration  are  to  the  right  of  the  solution  and  the  CDF 
has  a  small  slope  In  their  vicinity.  In  addition,  the  secant 
method  may  suffer  from  roundoff  error  In  the  computation  of  step 
size  resulting  from  a  division  by  the  difference  of  nearly  equal 
quantities.  This  situation  can  occur  when  both  points  lie  on  the 
same  side  of  the  solution.  These  drawbacks  offset  the  faster 
convergence  of  the  locally  convergent  methods. 

In  practice,  one  often  starts  with  a  globally  convergent 
method  and  switches  to  a  more  rapidly  converging  local  method  as 
the  solution  Is  neared.  The  switch  to  the  locally  convergent 
algorithm  can  be  made  when  the  step  size  using  the  Initial 
Iteration  method  falls  below  some  preset  threshold  value.  Such  a 
hybrid  scheme  could  be  Implemented  by  starting  with  our  algorithm 


and  then  switching  to  Newton-Raphson.  This  would  be  particularly 
easy  since  (4.6)  Is  almost  Identical  to  the  Newton-IUphson 
algorithm.  Indeed,  by  simply  changing  the  denominator  to  C'Ct^)  ~ 
H'(tn),  we  obtain  the  Newton-Raphson  Iteration  for  finding  the  root 
of  F(T)  -  F(t)  ■  0,  which  Is  the  desired  equation.  Finally,  we 
note  that  schemes  exist  for  accelerating  the  convergence  of  linear- 
rate  algorithms.  One  such  technique  Is  Altken's  delta-squared 
process  (see,  for  example,  Ralston  [1965]).  However,  It  is 
generally  Just  as  effective  to  use  a  hybrid  scheme  as  described 
above . 

4.2  The  Generation  of  Random  Variates 

The  decomposition  of  a  generalized  hyperexponential 
distribution  into  "positive"  and  "negative”  functions  similar  to 
(4.1)  allows  an  Interesting  application  of  the  acceptance-rejection 
method  to  generate  random  variates  for  simulation.  Our  approach  Is 
a  modification  of  the  work  of  Bignaml  and  de  Mattels  [1971],  also 
discussed  In  Everltt  and  Hand  [1981]. 

Let  the  PDF  from  which  samples  are  desired  be  given  by 

m  n 

f(x)  *  I  Pif.(x)  -  I  q.g.(x)  (4.1.1) 

1-1  J-1  J  J 

where  f^  and  gj  are  PDFs,  pj^,  qj  >  0  and  "  I  q^  -  !• 

Rewrite  (4.1.1)  as 

f(x)  +  I  qj8j(x)  •  I  Pifj^(x)*  (4.1.2) 

Divide  through  by  1  +  I  qj^  -  I  to  obtain 

-  f(x)  +  I  - - —  g/x)  “  I  -  f,(x)  .  (4.1.3) 

1  +  ^  rp. 

Both  the  left  and  right  sides  of  (4.1.3)  now  are  legitimate 
mixtures.  Suppose  a  value  of  x  is  selected  from  the  mixture  on  the 
right  side  of  (4.1.3).  Since  this  observed  value  is  compatible 
with  the  density  shown  on  the  left  of  (4.1.3),  It  can  be  viewed  as 
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arising  from  either  f  or  one  of  the  gj.  The  posterior  probability 
that  It  came  from  f  can  be  found  from  Bayes*  rule  to  be 


f(x)  +  I  ‘lj8j<*)  I 


(4.1.4) 


This  suggests  that  If  the  selected  value  Is  accepted  with 
probability  given  by  (4.1.4),  then  the  probability  density 
governing  the  accepted  value  will  be  f(x)  as  desired.  To  summarize 
the  procedure: 

1.  Generate  y  from  the  mixture  (Pj^/^P^)^ j^(y) • 

2.  Generate  u  from  U(0,1). 

3.  Compute  t  -  f(y)/(^Pifj^(y))- 

4.  If  u  <  t  set  X  *  y;  otherwise  discard  y,  return  to 
Step  1  and  repeat  the  process  until  a  value  of  x  Is 
selected. 

The  discussion  above  Is  rather  heuristic.  We  now  prove  In  a 
rigorous  fashion  that  the  accepted  values  come  from  the  desired 
distribution.  Make  the  following  definitions: 

I  Pi  »  1/k  , 

l-l 

1  =  w(x),  and  (4.1.5) 

r  Pi 

I  -  f.(x)  »  kw(x)  =  r(x)  .  (4.1.6) 

'Pi 

Further  denote  the  event  that  a  value  of  Y  Is  accepted  by  A.  We 

show  that  the  distribution  of  the  accepted  values  Is  given  by 

x 

Pr{x  <  x}  ■  F(x)  ■  /  f(z)dz.  Observe  that  the  values  of  X  are 

a  subset  of  the  values  of  Y  and  that  conditioned  on  A,  values  of  X 
and  Y  are  the  same.  That  Is 


Pr{x  <  x|  -  Pr{Y  <  x|a|  . 


(4.1.7) 
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By  the  definition  of  conditional  probability,  the  right  side  of 
(4.1.7)  becomes  <  x,a} 


Pr{Y  <  x|a} 


Pr{A} 


(4.1.8) 


Next  evaluate  the  expressions  in  the  numerator  and  denominator. 

From  the  procedure  for  generating  Y  and  (4.1.6)  we  have  that  the 
PDF  of  Y  is  r(y).  We  obtain  an  expression  for  Pr{A}  by  integrating 
the  joint  distribution  of  A  and  Y  over  the  domain  of  Y,  as  follows: 


Pr{A}  -  /“  Pr{A,y}  dy  -  /“  Pr{A|Y  -  y|r(y)dy  .  (4.1.9) 


From  the  acceptance  procedure  and  (4.1.5), 


Pr{A|Y  =■  y}  -  Pr{u  <  f(y)/w(y)}  -  f(y)/w(y).  (4.1.10) 


Substituting  (4.1.10)  into  (4.1.9)  yields 


Pr{A}  -  r  [f(y)/w(y)Jr(y)dy  -  /*  kf(y)dy  -  k,  (4.1.11) 


recalling  that  r(y)  »  kw(y).  Next  evaluate  the  numerator  of 
(4.1.8).  Again,  Integrate  the  Joint  distribution  over  y: 


Pr{Y  <  x,A}  -  /"  Pr{Y  <  x,a1y  -  :  |r(y)dy 


/  Pr{Y  <  x,A|Y  -  y}r(y)dy  , 


(4.1.12) 


where  the  second  equality  follows  from  the  fact  that  the 
event  {y  <  x}  has  zero  probability  for  Y  ~  y  >  x.  By  the  same 
token,  the  event {y  <  x}  is  certain  idien  conditioned  on  Y  ■  y  <  x  so 
(4.1.12)  becomes,  with  the  help  of  (4.1.10), 

X 

PrjY  <  x,A}  -  /  PrlAlY  -  y}r(y)dy 

X 

•  /  [f(y)/w(y)]r(y)dy  (4.1.13) 
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-  k  /*  f(y)dy  , 

with  the  last  equality  following  from  (4.1.6).  Finally, 
substituting  (4.1.11)  and  (4.1.13)  Into  (4.1.8)  yields 

Pr{Y  <  x|a}  -  f(y)dy  , 

and  this  together  with  (4.1.7)  establishes  the  desired  result. 
This  proof  Is  based  on  that  given  on  p.  273  of  Law  and  Kelton 
[1982]  for  the  general  case  of  acceptance-rejection  generation  of 
random  variates  distributed  according  to  any  continuous 
distribution. 


5.  Conclusion 

Among  the  areas  of  future  vork  that  present  themselves,  an 
obvious  and  practical  question  Is  that  of  deciding  how  many  terms 
to  Include  In  the  GH  mixture.  This  Issue  Is  related  to  the 
denseness  and  Identlf lability  notions.  However,  no  general  formal 
procedure  Is  found  In  the  extant  literature.  Experience  cited  by 
Harris  and  Sykes  [1986]  suggests  that  a  relatively  small  number  of 
terms  Is  often  quite  adequate  for  fitting  raw  data.  However, 
definitive  guidelines  for  determining  the  precise  number  of  terms 
are  needed. 

This  study  has  Introduced  the  family  of  generalized 
hyperexponentlal  (GH)  distributions  and  demonstrated  desirable 
properties  of  this  class  which  make  them  attractive  as 
approximations  to  general  COFs.  Among  the  features  noted  In  this 
work  are: 

1.  The  GH  class  Is  dense  In  the  set  of  all  CDFs  defined 

on  [0,<*)  so  that  a  GH  CDF  may  be  found  that  Is  as  close  as 
desired  (with  respect  to  a  suitable  metric)  to  any 
specified  CDF. 

2.  GH  distributions  have  a  simple  mathematical  structure  that 
facilitates  such  operations  as  differentiating. 
Integrating,  and  taking  Laplace  transforms.  These 
operations  frequently  occur  In  application  areas  where  GH 
approximations  can  be  used. 

3.  GH  distributions  have  rational  Laplace  transforms  and  are 
therefore  Coxian. 

4.  A  GH  distribution  has  a  unique  representation  as  a  linear 
combination  of  exponential  terms.  This  property  Is  useful 
both  for  Identification  of  a  CDF  as  a  member  of  GH  and  for 
such  statistical  procedures  as  parameter  estimation. 

5.  Since  GH  CDFs  are  composed  of  exponentials,  the  vast 
literature  dealing  with  exponential  and  hyperexponentlal 
distributions  Is  relevant  and  techniques  for  these  common 
distributions  can  often  be  adapted  for  use  with  GH 
distributions. 
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6.  A  computer-based  maximum  likelihood  estimation  algorithm 
exists  for  fitting  a  GH  distribution  to  sample  data.  This 
provides  a  practical  advantage  for  modeling  with  GH 
distributions. 

7.  A  simple  globally  convergent  numerical  algorithm  exists  for 
Inverting  a  GH  distribution.  Being  able  to  invert  a  CDF  Is 
Important  for  statistical  applications,  such  as  calculating 
distribution  percentiles. 

8.  The  GH  class  of  CDFs  Is  closed  with  respect  to  operations 
occurring  In  a  diverse  collection  of  application  areas 
Including  basic  probability  and  statistics,  reliability 
theory,  and  queueing  theory. 


V'Vi***  Vi* 
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